Unsupervised feature selection for large data sets

Similar resources

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering a...

Unsupervised Feature Selection for Text Data

Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The increase in online documents and email communication creates a need for tools that can operate without the supervision of the user. In this paper we look at novel feature selection techniques that address this need. A distributional similarity measure from information theory is appl...

Unsupervised feature selection for sparse data

Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not know the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with the bag-of-words (BoW) re...

Robust Unsupervised Feature Selection on Networked Data

Feature selection has proven effective in preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption is invalid in networked data since instances are not only associated with high-dimensional features but...

Parallelized Unsupervised Feature Selection for Large-Scale Network Traffic Analysis

In certain domains, where model interpretability is highly valued, feature selection is often the only possible option for dimensionality reduction. However, two key problems arise. First, the size of data sets today makes it unfeasible to run centralized feature selection algorithms in reasonable amounts of time. Second, the impossibility of labeling data sets rules out supervised techniques. ...

Journal

Journal title: Pattern Recognition Letters

Year: 2019

ISSN: 0167-8655

DOI: 10.1016/j.patrec.2019.08.017